Adaptive, fast walking in a biped robot under neuronal control and learning
Human walking is a dynamic, partly self-stabilizing process relying on the interaction of the biomechanical design with its neuronal control. The coordination of this process is a very difficult problem, and it has been suggested that it involves a hierarchy of levels, where the lower ones, e.g., interactions between muscles and the spinal cord, are largely autonomous, and where higher-level control (e.g., cortical) arises only pointwise, as needed. This requires an architecture of several nested sensori-motor loops, in which the walking process provides feedback signals to the walker's sensory systems, which can be used to coordinate its movements. To complicate the situation, at a maximal walking speed of more than four leg lengths per second, the cycle period available to coordinate all these loops is rather short. In this study we present a planar biped robot which uses the design principle of nested loops to combine the self-stabilizing properties of its biomechanical design with several levels of neuronal control. Specifically, we show how to adapt control by including online learning mechanisms based on simulated synaptic plasticity. This robot can walk at high speed (>3.0 leg lengths/s), self-adapting to minor disturbances and reacting robustly to abruptly induced gait changes. At the same time, it can learn to walk on different terrains, requiring only a few learning experiences. This study shows that the tight coupling of physical with neuronal control, guided by sensory feedback from the walking pattern itself and combined with synaptic learning, may be a way forward to better understand and solve coordination problems in other complex motor tasks.
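The idea of a nested sensori-motor loop with online synaptic plasticity can be illustrated in a few lines. The sketch below is a hypothetical single loop, not the robot's actual controller: a fixed ground-contact reflex drives a motor neuron, and the plastic synapse of an earlier-arriving sensory cue grows through a differential Hebbian update, so the motor command gradually anticipates the reflex. All timings, gains, and the trace decay are illustrative assumptions.

```python
def learn_anticipatory_reflex(cycles=10, period=40, eta=0.05):
    """Minimal sketch of one sensori-motor loop with online plasticity.
    An early sensory cue precedes a ground-contact reflex in each gait
    cycle; its synaptic weight w follows a differential Hebbian rule
    (eligibility trace of the cue times the temporal derivative of the
    reflex signal). Timings and gains are illustrative assumptions."""
    w = 0.0            # plastic weight of the anticipatory pathway
    trace = 0.0        # low-pass eligibility trace of the early cue
    prev_reflex = 0.0
    for t in range(cycles * period):
        phase = t % period
        early = 1.0 if phase == 5 else 0.0    # anticipatory cue
        reflex = 1.0 if phase == 10 else 0.0  # ground-contact reflex
        trace = 0.9 * trace + early
        # weight grows when the traced cue coincides with reflex onset
        w += eta * trace * (reflex - prev_reflex)
        prev_reflex = reflex
        motor = w * early + reflex            # learned + reflexive drive
    return w
```

After a few gait cycles the anticipatory weight ends up positive, i.e., the motor command begins before ground contact — the qualitative effect, not a model of the published controller.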
Cluster update and recognition
We present a fast and robust cluster update algorithm that is especially
efficient in implementing the task of image segmentation using the method of
superparamagnetic clustering. We apply it to a Potts model with spin
interactions that are defined by gray-scale differences within the image.
Motivated by biological systems, we introduce the concept of neural inhibition
to the Potts model realization of the segmentation problem. Including the
inhibition term in the Hamiltonian results in enhanced contrast and thereby
significantly improves segmentation quality. As a second benefit we can - after
equilibration - directly identify the image segments as the clusters formed by
the clustering algorithm. To construct a new spin configuration the algorithm
performs the standard steps of (1) forming clusters and of (2) updating the
spins in a cluster simultaneously. As opposed to standard algorithms, however,
we share the interaction energy between the two steps. Thus the update
probabilities are not independent of the interaction energies. As a
consequence, we observe an acceleration of the relaxation by a factor of 10
compared to the Swendsen and Wang procedure. Comment: 4 pages, 2 figures
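The two standard steps named above are compact enough to sketch. The code below implements the conventional Swendsen-Wang-style sweep (not the accelerated shared-energy variant the abstract proposes) on a grid image: bonds freeze between equal-spin neighbours with a probability set by a gray-scale coupling, clusters are collected by union-find, and each cluster receives one new random spin. The coupling form and all constants are illustrative assumptions.

```python
import math
import random

def sw_sweep(gray, spins, q=10, temperature=1.0, theta=10.0, seed=0):
    """One Swendsen-Wang-style cluster update on a 2D image grid.
    Couplings J = exp(-(g_i - g_j)^2 / (2*theta^2)) derive from
    gray-scale differences, as in superparamagnetic clustering; this is
    the standard algorithm, not the shared-energy variant. q,
    temperature, and theta are illustrative values."""
    rng = random.Random(seed)
    h, w = len(gray), len(gray[0])
    parent = list(range(h * w))

    def find(i):                      # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # step (1): form clusters by stochastically freezing bonds
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):
                r2, c2 = r + dr, c + dc
                if r2 >= h or c2 >= w:
                    continue
                if spins[r][c] != spins[r2][c2]:
                    continue          # bonds only between equal spins
                j = math.exp(-(gray[r][c] - gray[r2][c2]) ** 2
                             / (2 * theta ** 2))
                p = 1.0 - math.exp(-j / temperature)
                if rng.random() < p:
                    parent[find(r * w + c)] = find(r2 * w + c2)

    # step (2): give every cluster one new random spin
    new_spin = {}
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            root = find(r * w + c)
            if root not in new_spin:
                new_spin[root] = rng.randrange(q)
            out[r][c] = new_spin[root]
    return out
```

At low temperature the frozen bonds trace out homogeneous image regions, so after equilibration the clusters can be read off directly as segments, as the abstract describes.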
Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions
Reinforcement learning methods can be used in robotics applications, especially for specific target-oriented problems, for example the reward-based recalibration of goal-directed actions. To this end, still relatively large and continuous state-action spaces need to be handled efficiently. The goal of this paper is thus to develop a novel, rather simple method which uses reinforcement learning with function approximation in conjunction with different reward strategies for solving such problems. For the testing of our method, we use a four-degree-of-freedom reaching problem in 3D space, simulated by a two-joint robot arm system with two DOF per joint. Function approximation is based on 4D, overlapping kernels (receptive fields), and the state-action space contains about 10,000 of these. Different types of reward structures are compared, for example reward-on-touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of the rather large number of states and the continuous action space, these reward/punishment strategies allow the system to find a good solution usually within about 20 trials. The efficiency of our method demonstrated in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined, in situations where other types of learning might be difficult.
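The ingredients described above — receptive-field function approximation, a continuous action, and a reward-on-approach signal — can be sketched in a heavily simplified 1D analogue. This is not the paper's 4-DOF setup: the task (a point "arm" reaching a target on a line), the actor-critic form, and all learning rates are assumptions made for illustration.

```python
import math
import random

def train_reach(episodes=300, steps=30, seed=0):
    """Receptive-field function approximation with a continuous action:
    a 1D 'arm' at position s must reach target 0.8 from start 0.2.
    Gaussian kernels give the features; an actor-critic with a
    reward-on-approach signal (negative distance) updates value and
    policy weights from the TD error. Illustrative sizes and rates."""
    rng = random.Random(seed)
    centers = [i / 10 for i in range(11)]   # receptive-field centres
    sigma, gamma, a_v, a_p, target = 0.1, 0.9, 0.2, 0.1, 0.8
    w_v = [0.0] * len(centers)              # critic weights
    w_p = [0.0] * len(centers)              # actor weights

    def phi(s):                             # normalized Gaussian features
        acts = [math.exp(-(s - c) ** 2 / (2 * sigma ** 2)) for c in centers]
        z = sum(acts)
        return [a / z for a in acts]

    def value(f):
        return sum(wi * fi for wi, fi in zip(w_v, f))

    for _ in range(episodes):
        s = 0.2
        for _ in range(steps):
            f = phi(s)
            noise = rng.gauss(0.0, 0.5)                 # exploration
            a = sum(wi * fi for wi, fi in zip(w_p, f)) + noise
            s2 = min(1.0, max(0.0, s + 0.1 * math.tanh(a)))
            r = -abs(s2 - target)                       # reward-on-approach
            delta = r + gamma * value(phi(s2)) - value(f)
            for i, fi in enumerate(f):
                w_v[i] += a_v * delta * fi              # critic update
                w_p[i] += a_p * delta * noise * fi      # policy gradient
            s = s2
    return w_p, phi

# greedy rollout with the learned continuous policy
w_p, phi = train_reach()
s = 0.2
for _ in range(30):
    a = sum(wi * fi for wi, fi in zip(w_p, phi(s)))
    s = min(1.0, max(0.0, s + 0.1 * math.tanh(a)))
```

The shaped (approach-based) reward gives informative TD errors from the first trials, which is the property the abstract credits for the fast (~20-trial) convergence in the full problem.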
Reinforcement learning or active inference?
This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimise their free energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming, namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.
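The flavour of "behaviour without reward" can be conveyed with a toy agent that is deliberately much simpler than the mountain-car benchmark: no reward, value, or utility appears anywhere; the agent merely holds a prior belief about the state it expects to occupy, and action descends the gradient of the squared prediction error, a crude stand-in for free energy. The dynamics and gains below are illustrative assumptions, not the paper's generative model.

```python
def free_energy_agent(steps=500, dt=0.1, prior=1.0, k=0.5):
    """Toy 'active inference' agent: the agent expects to sense state
    `prior`; action is the negative gradient of a free-energy proxy
    F = (x - prior)^2 / 2 with respect to x, applied as a force on a
    damped point mass. A minimal setpoint illustration under assumed
    linear dynamics -- not the mountain-car solution."""
    x, v = 0.0, 0.0
    for _ in range(steps):
        err = x - prior          # sensory prediction error
        a = -k * err             # action suppresses predicted error
        v += dt * (a - 0.5 * v)  # damped point-mass dynamics
        x += dt * v
    return x
```

The agent settles at its prior expectation, so goal-directed behaviour emerges purely from minimising prediction error — the qualitative point the abstract makes with a far richer model.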
Coverage, Continuity and Visual Cortical Architecture
The primary visual cortex of many mammals contains a continuous
representation of visual space, with a roughly repetitive aperiodic map of
orientation preferences superimposed. It was recently found that orientation
preference maps (OPMs) obey statistical laws which are apparently invariant
among species widely separated in eutherian evolution. Here, we examine whether
one of the most prominent models for the optimization of cortical maps, the
elastic net (EN) model, can reproduce this common design. The EN model
generates representations which optimally trade off stimulus space coverage and
map continuity. While this model has been used in numerous studies, no
analytical results about the precise layout of the predicted OPMs have been
obtained so far. We present a mathematical approach to analytically calculate
the cortical representations predicted by the EN model for the joint mapping of
stimulus position and orientation. We find that in all previously studied
regimes, predicted OPM layouts are perfectly periodic. An unbiased search
through the EN parameter space identifies a novel regime of aperiodic OPMs with
pinwheel densities lower than found in experiments. In an extreme limit,
aperiodic OPMs quantitatively resembling experimental observations emerge.
Stabilization of these layouts results from strong nonlocal interactions rather
than from a coverage-continuity-compromise. Our results demonstrate that
optimization models for stimulus representations dominated by nonlocal
suppressive interactions are in principle capable of correctly predicting the
common OPM design. They call into question whether visual cortical feature
representations can be explained by a coverage-continuity-compromise. Comment: 100 pages, including an Appendix, 21 + 7 figures
Motion processing with wide-field neurons in the retino-tecto-rotundal pathway
The retino-tecto-rotundal pathway is the main visual pathway in non-mammalian vertebrates and has been found to be highly involved in visual processing. Despite the extensive receptive fields of tectal and rotundal wide-field neurons, pattern discrimination tasks suggest a system with high spatial resolution. In this paper, we address the problem of how global processing performed by motion-sensitive wide-field neurons can be brought into agreement with the concept of a local analysis of visual stimuli. As a solution to this problem, we propose a firing-rate model of the retino-tecto-rotundal pathway which describes how spatiotemporal information can be organized and retained by tectal and rotundal wide-field neurons while processing Fourier-based motion in the absence of periodic receptive-field structures. The model incorporates anatomical and electrophysiological experimental data on tectal and rotundal neurons, and the basic response characteristics of tectal and rotundal neurons to moving stimuli are captured by the model cells. We show that local velocity estimates may be derived from rotundal-cell responses via superposition in a subsequent processing step. Experimentally testable predictions which are both specific and characteristic to the model are provided. Thus, a conclusive explanation can be given of how the retino-tecto-rotundal pathway enables the animal to detect and localize moving objects or to estimate its self-motion parameters.
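The apparent tension above — global pooling versus motion analysis — can be made concrete with a minimal sketch. The neuron below sums simple correlation-type (Reichardt-like) subunits over an entire 1D image, so it has no spatial resolution at all, yet the sign of its pooled output still distinguishes the direction of motion. The detector form and pooling are illustrative, not the paper's firing-rate model.

```python
def widefield_response(frames, delay=1):
    """A wide-field motion neuron as a sum of local correlation-type
    (Reichardt-like) detectors pooled over the whole 1D image. Global
    pooling discards position, yet the sign of the summed output still
    distinguishes rightward (positive) from leftward (negative) motion.
    Illustrative sketch, not the paper's model."""
    resp = 0.0
    for t in range(delay, len(frames)):
        prev, cur = frames[t - delay], frames[t]
        for i in range(len(cur) - 1):
            # rightward-tuned minus leftward-tuned subunit
            resp += prev[i] * cur[i + 1] - prev[i + 1] * cur[i]
    return resp

def moving_bar(direction, length=12, steps=8):
    """Frames of a single bright pixel drifting one step per frame."""
    frames = []
    for t in range(steps):
        row = [0.0] * length
        row[2 + t if direction > 0 else length - 3 - t] = 1.0
        frames.append(row)
    return frames
```

Recovering *local* velocity from such globally pooled responses then requires a subsequent superposition step across a population of wide-field cells, which is the step the paper's model supplies.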
Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison
A confusingly wide variety of temporally asymmetric learning rules exists related to reinforcement learning and/or to spike-timing-dependent plasticity, many of which look exceedingly similar while displaying strongly different behavior. These rules often find their use in control tasks, for example in robotics, and for this, rigorous convergence and numerical stability are required. The goal of this article is to review these rules and compare them, to provide a better overview of their different properties. Two main classes will be discussed: temporal difference (TD) rules and correlation-based (differential Hebbian) rules, as well as some transition cases. In general we will focus on neuronal implementations with changeable synaptic weights and a time-continuous representation of activity. In a machine-learning (non-neuronal) context, a solid mathematical theory of TD learning has existed for several years. This can partly be transferred to a neuronal framework, too. On the other hand, only now has a more complete theory also emerged for differential Hebbian rules. In general, rules differ by their convergence conditions and their numerical stability, which can lead to very undesirable behavior when applying them. For TD, convergence can be enforced with a certain output condition assuring that the δ-error drops on average to zero (output control). Correlation-based rules, on the other hand, converge when one input drops to zero (input control). Temporally asymmetric learning rules treat situations where incoming stimuli follow each other in time. Thus, it is necessary to remember the first stimulus in order to relate it to the later-occurring second one. To this end, different types of so-called eligibility traces are used by these two types of rules. This aspect again leads to different properties of TD and differential Hebbian learning, as discussed here.
Thus, this paper, while also presenting several novel mathematical results, is mainly meant to provide a road map through the different neuronally emulated temporally asymmetric learning rules and their behavior, to provide some guidance for possible applications.
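The structural contrast drawn above — δ-error gating (output control) versus derivative-of-output gating (input control), both riding on an eligibility trace of the earlier input — can be sketched side by side on one stimulus-pairing task. The discrete timings, trace decay, and rates below are illustrative assumptions, not the article's time-continuous formulation.

```python
def pairing_experiment(rule, trials=20, eta=0.1, gamma=0.9):
    """One stimulus-pairing task, two learning rules. An early input x1
    (t=5) predicts a later input x2 (t=10). Both rules use an eligibility
    trace of x1 but gate the weight change differently: TD by the
    delta-error (output control), differential Hebbian by the temporal
    derivative of the postsynaptic activity (input control).
    Timings, trace decay, and rates are illustrative assumptions."""
    w = 0.0
    for _ in range(trials):
        trace, pred_prev, out_prev = 0.0, 0.0, 0.0
        for t in range(20):
            x1 = 1.0 if t == 5 else 0.0
            x2 = 1.0 if t == 10 else 0.0         # later stimulus ("reward")
            trace = 0.8 * trace + x1             # eligibility trace of x1
            pred = w * x1                        # learned prediction
            out = pred + x2                      # postsynaptic activity
            if rule == "td":
                delta = x2 + gamma * pred - pred_prev
                w += eta * trace * delta         # gated by delta-error
            else:
                w += eta * trace * (out - out_prev)  # gated by d(out)/dt
            pred_prev, out_prev = pred, out
    return w
```

Both rules strengthen the predictive synapse under this pairing, but through different gating signals — which is why, as the abstract stresses, their convergence conditions (δ-error to zero versus input to zero) differ.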